Integration indicators in immigration phenomena. 
A statistical mechanics perspective. 

Adriano Barra *, Pierluigi Contucci †, Rickard Sandell ‡, Cecilia Vernia §

* Dipartimento di Fisica, Sapienza Università di Roma, † Dipartimento di Matematica, Università di Bologna, ‡ Departamento de Historia Económica e Instituciones, Universidad
de Carlos III de Madrid, and § Dipartimento di Scienze Fisiche Informatiche e Matematiche, Università di Modena e Reggio Emilia



AUTHOR SUMMARY 

Even though the integration of immigrants is a political 
priority in many countries, we have limited knowledge about 
the mechanisms that promote integration. For example, it 
is generally understood that social interaction between im- 
migrants and the native population is a necessary condition 
for immigrant integration. However, it is not clear how sen- 
sitive integration is to an increase in immigrant density, and 
to what extent social interaction translates into higher inte- 
gration. We propose a novel approach to the study of im- 
migrant integration using data analysis methods and mathe- 
matical models inspired by statistical physics. We show that, 
independent of time, integration quantifiers exhibit linear or 
non-linear growth on immigration density, depending on the 
context. We explain these differences by means of a properly 
defined social interaction component, and we illustrate how 
this leads to precise estimates of integration across different 
situations. Disclosing and isolating the mechanism driving 
integration, our framework has the potential to improve our 
ability to formulate more efficient integration policies. 

We study classical integration quantifiers like the percent- 
age of labor contracts, permanent and temporary, given to 
immigrants, mixed marriages, and newborns with mixed parents. Each quantity is studied as a function of immigrant density $\gamma$, that is, the ratio between the number of immigrants $N_{imm}$ and the total population of immigrants and natives $N = N_{imm} + N_{nat}$. In this context, a natural parameter for assessing change in integration quantifiers is the product

$$N_{imm} N_{nat} \propto \Gamma(\gamma) := \gamma(1 - \gamma) ,$$

since it counts the number of possible cross-group links. By 
analyzing a database on immigration and integration from 
Spain, we find that while the quantifiers measuring labor market integration (green and yellow dots in figure 1) exhibit linear growth in $\Gamma$, the quantifiers capturing the intensity in
mixed-marriages and the number of newborns with mixed par- 
ents (blue and red dots) display non-linear behavior. That is, 
they take off at a very high growth rate which progressively 
decreases at increasing densities. Hence, if we apply a linear 
theory when interpreting the whole database (grey line), we 
would underestimate the level of integration when the immi- 
grant density is low and overestimate it when it is high. More- 
over, an integration forecast based on a linear theory (black 
line) would lead to predictions that are twice as large as the 
observed values. A finely tuned fit of the blue and red points 
gives a functional shape proportional to $\sqrt{\Gamma}$ starting very close
to the origin of the axes (blue and red curves in figure 1). We
proceed by building a theory able to describe all the observed 
data on a unified mathematical framework based on insights 
from statistical mechanics. A labor contract, a marriage, a 
child birth: all are coupling relations among humans. In a 
two-group system such as a society composed of immigrants 
and natives, there can be in-group or cross-group couplings. 
Consequently, the choice between, say, marrying or hiring an
immigrant over a native is dichotomous. To this end the Discrete
Choice theory proposed by McFadden [1], the success of
which in modeling human behavior in a variety of situations 
has been widely celebrated, is a natural candidate to describe 
the frequency of cross-group couplings in large populations. 
McFadden's theory contains a crucial assumption of mutual 
independence between the involved random variables. When 
applied to the case we are studying, that theory predicts a linear
function of $\Gamma$ in a suitable interval. Our findings indicate
first how well McFadden's theory works in assessing the level 
of integration in the Spanish labor market and suggest that 
the choices between assigning the job to a native or to an im- 
migrant are made in a mutually independent fashion, case by 
case, no matter how other actors choose, with little or no peer- 
to-peer mechanism at all. Our results show, moreover, that 
the choice of marrying or having a child with an immigrant 
partner is not well described by the classical discrete choice 
theory. We argue that this is because the latter decisions are 
mostly made after having observed others, particularly people 
we trust, who have made a similar decision and shared their 
personal experience of the outcome of their choice. When this 
is the case, the choices become an act of imitation and are 
no longer independent. Theories that relax the assumption 
of independence and cater to imitation have been introduced 
by Brock and Durlauf [2] and predict a square root behavior 
of the probability of cross-group couplings as observed in the 
marriage and child birth data. The technical part of our work 
identifies a mathematical model capable of dealing with both 
situations for different values of the parameters. The result 
is a generalization of the monomer-dimer model [3] with the 
addition of an imitative interacting social network component 
of small world-type [4] . The model we propose reduces to the 
classical discrete choice theory with linear growth in situations 
when imitation is negligible, and to the square root behavior 
when imitation is strong. The social network structure explains
why the integration starts at (or very close to) $\Gamma = 0$
when the choice is dependent on other actors' behavior.

In this work we have discovered and quantitatively dis- 
tinguished two types of immigrant integration mechanisms. 
Our results improve our ability to target integration policies 
since they provide an operative method to distinguish whether 
a macro phenomenon such as immigrant integration is the 
product of social action, as in the case of intermarriages and 
newborns with mixed parents, or the product of the common 
action of many people [5], as in the labor market case. For 
example, in the case of labor market integration, policies pro- 
moting immigrant access to the labor market are likely to be 
more effective than policies targeting social aspects of the employer/employee
relationship. In contrast, for mixed marriages
and newborns with mixed parents, policies targeting social
interactions between immigrants and natives are likely to be
more efficient, particularly at high densities of immigration.

Fig. 1. Dots are average quantities versus $\Gamma$. Left upper panel: quantifier $J_p$ (green dots), the fraction of permanent labor contracts given to immigrants out of the total of labor contracts, with the best linear fit (free fit) $a\Gamma$ ($a = 1.52 \pm 0.05$, goodness of fit $R^2 = 0.985$). Right upper panel: quantifier $J_t$ (yellow dots), fraction of temporary contracts given to immigrants, with the best linear fit (free fit) $a\Gamma$ ($a = 1.81 \pm 0.09$, with a goodness of fit $R^2 = 0.963$). Left lower panel: quantifier $M_m$ (blue dots), fraction of mixed marriages, with the best square root fit (blue curve) $c\sqrt{\Gamma}$ ($c = 0.53 \pm 0.02$, goodness of fit $R^2 = 0.992$), the best linear free fit (grey line) $a\Gamma$ ($a = 1.18 \pm 0.07$, with a goodness of fit $R^2 = 0.855$) and the best linear extrapolation fit (black line) $b\Gamma$ ($b = 1.92 \pm 0.07$, for $0 < \Gamma < 0.035$, goodness of fit $R^2 = 0.964$). Right lower panel: the quantifier $B_m$ (red dots), fraction of newborns with mixed parents, with the best square root fit (red curve) $c\sqrt{\Gamma}$ ($c = 0.28 \pm 0.01$, goodness of fit $R^2 = 0.984$), the best linear free fit (grey line) $a\Gamma$ ($a = 0.64 \pm 0.05$, goodness of fit $R^2 = 0.789$) and the best linear extrapolation fit (black line) $b\Gamma$ ($b = 1.04 \pm 0.05$, for $0 < \Gamma < 0.04$, goodness of fit $R^2 = 0.922$).

ABSTRACT

Integration of immigrants is a complex socioeconomic phenomenon 
considered difficult to describe, understand, and predict. We address 
the problem of how integration changes with immigration density,
and we propose a novel approach to its study guided by a statistical
mechanics perspective. More precisely, we focus on studying the
dependence of classical integration quantifiers, such as the percentage
of jobs, temporary and permanent, given to immigrants, mixed
marriages, and newborns with parents of mixed origin, on the density of
immigrants in the population. Analysis of the average data behavior
shows that while the McFadden discrete choice theory is in excellent
agreement with the job market quantifiers, the mixed marriages and
newborns quantifiers behave in accordance with an imitative theory
similar to the one introduced by Brock and Durlauf and suitably
extended to a monomer-dimer model with interacting social network.
Our findings show that a model that allows for imitation explains the
anomalous high growth in the rate of mixed marriages and newborns
with mixed parents observed at low immigration densities. Ignoring
the possibility of imitation would instead underestimate the observed
quantities by as much as 30% when immigrant densities are low, and
overestimate them with a similar error when the densities are high.
Our method opens up the possibility of predicting immigrant integration
quantifiers for all the immigration densities starting from their
observation at small densities.

Keywords: immigration phenomena, quantitative sociology, statistical mechanics, col- 
lective behavior 



Introduction 

The United Nations recently reported that there are about one
billion migrants worldwide, of which one quarter are international
migrants [1]. The size of the migration phenomenon
and the speed by which it increases have turned migration 
into a challenge that is at the top of the political agenda in 
the European Union, the United States, and in many other 
countries across the world. One reason why migration has 
become a major political priority is that it is a catalyst for 
large-scale social, economic, and demographic changes [2], capable
of producing opportunities but also turmoil and friction.

Integration and social cohesion are keywords when ad- 
dressing many of the challenges posed by increasing migration. 
Some of the problems related to such processes are carefully 
analyzed in several studies (for instance [3] and the references 
therein). The European Union in particular has identified 
a list of common basic principles to make integration work 
[4], based on employment for immigrants, frequent interactions
between immigrants and natives, and the like. 

Our work stems from the simple observation that very 
little is known about how integration happens and what are 
the mechanisms that make it work. For example, elemen- 
tary questions like how integration responds to an increase 
in immigration density or to what extent the intensity of in- 
teraction modifies the level of integration still beg coherent 
empirical and theoretical answers. Our study is inspired by 
the realization that precise answers to those questions are of 
paramount importance to formulating social policies able to 
promote integration. 

The research reported here addresses these issues from 
a quantitative point of view and proposes new perspectives 
on data collection, elaboration, and theoretical mathemati- 
cal modeling for their interpretation. More precisely, we ap- 
proach the problem of integration from the hard-sciences point 
of view by relying on methods and techniques from statistical 
mechanics, the branch of theoretical physics devised to ex- 
plain thermodynamic laws as emerging average behavior for 
systems composed of a high number of microscopic interacting 
particles. The application of ideas and methods from statistical
mechanics to fields other than physics has emerged in
several contexts over the past decades (see [5], [6], [7], [8], [9]
and [10]). Their development in quantitative sociology is ongoing
[11], [12], [13], [14], [15], [16], and they have recently
been used to study immigration phenomena [17], [18].

Our research focuses on classical quantifiers of integration 
such as the fraction of all temporary and permanent labor 
contracts given to immigrants, the fraction of marriages with 
spouses of mixed origin (native and immigrant), and the frac- 
tion of newborns with parents of mixed origin. We aim at 
a predictive theory by which the magnitude of the above- 
mentioned indicators can be expressed as a function of the 
density of immigrants $\gamma$, i.e. the ratio between the number of
immigrants $N_{imm}$ and the total population $N$,

$$\gamma = \frac{N_{imm}}{N} = \frac{N_{imm}}{N_{imm} + N_{nat}} , \qquad [1]$$

where $N_{nat}$ is the number of natives.
Since these quantifiers are the sum of many random variables
divided by their total number, and we are interested in studying
their dependence on $\gamma$, what we seek is first the empirical
law from real data and then the theoretical probability law 
that, in the limit of large numbers, entails the observed col- 
lective behavior. 

To introduce our approach, let us step back to a well- 
understood phenomenon of collective particle behavior like 
those considered in statistical physics, and let us say we are interested
in the basic matter of discovering whether those particles
behave independently or not. In many instances, in
particular for ferromagnetic particles, the knowledge of their 
collective quantities (e.g. the magnetization) is not enough to 
answer the questions since we could observe the same value 
for systems with or without interaction. Nevertheless, be- 
ing able to control some parameters like the temperature of 
the system or the strength of the imitation coefficient (cou- 
pling constant) among particles or the external magnetic field, 
makes the problem easily solvable by the classical theory of 
magnetism [31], [32]. When the interaction is negligible the
response of the system to a small solicitation is well approx- 
imated by a linear function. If instead the imitation is dom- 
inating, the system would remain insensitive to solicitations 
up to a critical threshold and start responding very quickly 
once that threshold has been exceeded. 

We want to point out that the same problem has emerged 
in a sociological context, as was lucidly formulated by Max 
Weber [19]: "Social action is not identical with the similar actions
of many persons... Thus, if at the beginning of a shower
a number of people on the street put up their umbrellas at the
same time, this would not ordinarily be a case of action mutually
oriented to that of each other, but rather of all reacting
in the same way to the like need of protection from the rain."

Individuals indeed do behave sometimes independently 
from each other. When that is the case the McFadden Discrete
Choice Theory [20] provides a powerful tool to study
social behavior as has been demonstrated in the celebrated 
predictive solution to the Bay Area Rapid Transit problem 
and, since then, in a variety of different contexts. Imitative 
or more generally correlated behavior is even more frequent. 
Classical examples are when the actions of others are imitated
because they are fashionable, traditional, or lend social
distinction [19]. Others have pointed out that individual
decisions are made according to imitation even in situations
that were previously not considered [13], [21]. For example,
sociologists have argued that imitation, or learning from the 
experiences of others, is a frequent and highly rational behav- 
ior when the consequences, social as well as personal, of one's 
actions are difficult to assess [22]. Therefore both types of behavior
are candidates when seeking to explain how integration
comes about.

With this in mind, from a conceptual point of view one of 
the main challenges, and perhaps the most intriguing part of 
the research reported here, concerns a problem that in the sta- 
tistical mechanics approach is considered a preliminary step: 
to be able to distinguish using quantitative methods whether 
the value of the integration quantifier follows from people act- 
ing according to some individual preferences independently of 
other people, as in Max Weber's rainfall example, or whether 
it follows as a result of social interaction with others and, of 
course, all the possibilities in between. The two extremes are 
described in statistical mechanics either as free theory (independent
particles, perfect gas [23]) or interacting theory with
possible phase transitions. 

While it is easy to envision formal similarities between 
particle behavior and human behavior, the difficulty in our 
context is that the analogy needs to be strengthened beyond 
the formal level. In the social sciences, in fact, there is no 
natural notion of system temperature or, in other terms, it 
is not clear how to measure a cost function for social actions 
nor what units to use to perform the measurement. Moreover 
there is no simple way to tune the degree of imitation between 
people or the strength of their individual tendency to decide 
on some choice. Our proposal to overcome the obstacle is 
to consider, as control parameter of the Immigrants-Natives 
system, the quantity that tunes the total number of available 
cross-link couplings among the two populations:

$$N_{imm} N_{nat} = \gamma(1 - \gamma) N^2 , \qquad [2]$$

$$\Gamma(\gamma) = \gamma(1 - \gamma) . \qquad [3]$$
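
As a quick numerical check of Eqs. [2] and [3], the following minimal Python sketch computes the density and the control parameter from population counts. The function names and the sample counts are illustrative assumptions, not values taken from the Spanish data.

```python
def immigrant_density(n_imm: int, n_nat: int) -> float:
    """gamma = N_imm / (N_imm + N_nat)."""
    return n_imm / (n_imm + n_nat)


def cross_link_parameter(gamma: float) -> float:
    """Gamma(gamma) = gamma * (1 - gamma), as in Eq. [3]."""
    return gamma * (1.0 - gamma)


# Hypothetical municipality: 2,000 immigrants and 18,000 natives.
n_imm, n_nat = 2_000, 18_000
n_tot = n_imm + n_nat

gamma = immigrant_density(n_imm, n_nat)
big_gamma = cross_link_parameter(gamma)

# Eq. [2]: the number of possible cross-group links is Gamma * N^2.
cross_links_from_gamma = big_gamma * n_tot ** 2
cross_links_direct = n_imm * n_nat
print(gamma, big_gamma, cross_links_from_gamma, cross_links_direct)
```

Because $\Gamma$ depends only on the density, two municipalities of very different size but equal $\gamma$ sit at the same value of the control parameter.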



By studying the integration quantifiers as functions of immi- 
grant density we show how to determine whether and to what 
extent immigrant integration is the result of social interaction 
or whether it reduces to an outcome brought about by inde- 
pendent choices. What we find is that while the labor market 
integration quantifiers are well described by a free theory of 
independent individual behavior à la McFadden, the integration
quantifiers on marriages and newborns display the typical
features of the imitative interacting theories with strong peer- 
to-peer effects. 

Our study shows that it is useful to have a new family of 
mathematical models based on a statistical mechanics exten- 
sion of discrete choice theory, since it offers a set of formal 
tools to systematically analyze and quantify socioeconomic 
situations. If properly applied, our theory can be used to pre- 
dict the level of integration in a two-group social system, like 
in a society with natives and immigrants, at different immi- 
grant densities using only information from observations when 
the immigrant density is low. This in turn can be a valuable 
tool for policy makers. 

The paper is organized into data description, data analysis and
elaboration, and the statistical mechanics approach used to
model them. The hard-sciences oriented reader may
find supplementary material in the last section, which is the
only technical part.



Data Description

We use unique data from Spain to develop our models and test our ideas. We focus on the time interval 1999 to 2010 because it corresponds to the period in which Spain received most of its immigrant population. The smallest geographical unit for which data are available is the administrative unit Municipality. Hence, in this work Location equals Municipality.

Data on local immigrant densities are compiled as follows. We use the size of the immigrant population and the native population in each municipality as reported in the 2001 Census as our baseline. Thereafter, we estimate the local immigrant densities for different points in time between 1999 and 2010. The analysis is based on the information contained in the Statistics on residential variation in Spanish municipalities¹ and statistics on vital events (births and deaths) as elaborated by Spain's National Statistical Agency (INE). A unique feature of the Spanish data is that they also include so-called undocumented immigrants, that is, immigrants who lack a residence permit ^^. Undocumented immigrants are usually not included in official statistical sources. However, their share of the immigrant population is often significant, and excluding them would underestimate the true size of the immigrant population and, most likely, change the nature of the studied phenomena.

Data on marriages and births are drawn from the local offices of Vital Records and Statistics across Spain (Registro Civil), and have been compiled by the INE. Data on marriages contain information about the time of the marriage as well as the place of birth, nationality, municipality of residence, and the like, of all the spouses entering into marriage in Spain. By our definition, a mixed marriage occurs when a Spanish-born (native) person marries a person born in a foreign country. Similarly, data on births contain information about the place of birth, nationality, municipality of residence, among other things, of all the newborns' parents. For the same reasons as with mixed marriages, we consider all newborns with one native and one foreign-born parent to be newborns with parents of mixed origin. Information is included on mixed marriages and newborns with parents of mixed origin where the foreign-born spouse or parent is an undocumented migrant. We focus our analysis on birth and marriage events that occurred during the period 1999 to 2008. However, data on density, marriages, and births are subject to minor data protection restrictions. An individual's residence municipality is only disclosed if its population is larger than 10,000. For this reason, out of approximately 8,000 municipalities in Spain, our analysis focuses on only 735. Still, 85% of Spanish immigrants reside in the included municipalities.

Data on labor contracts come from Spain's Continuous Sample of Employment Histories (the so-called Muestra Continua de Vidas Laborales or MCVL). It is an administrative data set with longitudinal information for a 4% non-stratified random sample of the population who are affiliated with Spain's Social Security. Sampling is conducted on a yearly basis. We use data from the waves 2005 to 2010². The data contain information on contractual conditions such as whether the individuals have a temporary or indefinite labor contract, as well as the contracts' start and stop times. Residential data at the level of municipality and information about place of birth are also available. In contrast to the data on densities, marriages, and births, for these data the residence municipality is only disclosed if the population is larger than 40,000.

For those unfamiliar with the Spanish immigration context, the following brief information may be useful. In 1999, Spain received fewer than 50,000 new documented and undocumented immigrants. Since then, annual immigration levels have increased dramatically, reaching a peak in 2006 and 2007, with inflows exceeding 800,000 (see light gray bars in Fig. 2). Spain's documented and undocumented foreign-born population has risen from little more than 1 million to over 6.5 million in the analyzed period (see solid line in Fig. 2). Its share of the total population has risen from less than 3% to over 13% in the same period. Currently there are immigrants from almost all nations in Spain. However, some 20 immigrant origins account for approximately 80% of Spain's total immigrant population. Immigrants from Romania form the largest minority in Spain (767,000 at the end of 2008), followed by immigrants from Morocco (737,000 at the end of 2008) and Ecuador (479,000 at the end of 2008). Europe and South America together account for over 70% of Spain's total immigrant population.

¹ The so-called Estadística de variaciones residenciales (EVR). For each municipality, the data contain information about (internal) migration to and from other Spanish municipalities as well as all international migration events.

² The inclusion of an individual in the sample is determined by a sequence in the individual's social security number that does not vary across sample waves. This means that individuals are maintained across samples. New affiliates with a social security number matching the predetermined sequence are added in each new wave.

Fig. 2. International immigration and stock of foreign-born population in Spain in the decade 1999-2009. The inset highlights migrant inflows from specific (main) countries of origin.



Data analysis and Elaboration 

We derive two datasets based on the information described 
in the previous section. One contains data on marriages and 
newborns, and the other on labor market affiliation. Both 
datasets contain spatial and temporal information, such as the 
municipal code, quarter, year, and the immigration density in 
the municipalities across time. The data on labor contracts 
consist of 3,553 entries over the period 2005-2010. The data
on marriages and newborns consist of 27,144 entries spanning
the period 1999-2008. For the overlapping period (16 quarters
of the 2005-2008 window), the values of $\gamma$ match very
well, which can be seen as a good test of the quality of the
second sample, since the first dataset is not a sample.

The quantifiers we study as a function of $\gamma$ are defined as:

$$J_p = \frac{\#\ \text{of permanent contracts to immigrants}}{\#\ \text{of permanent contracts}} ,$$

$$J_t = \frac{\#\ \text{of temporary contracts to immigrants}}{\#\ \text{of temporary contracts}} ,$$

$$M_m = \frac{\#\ \text{of mixed marriages}}{\#\ \text{of marriages}} ,$$

$$B_m = \frac{\#\ \text{of newborns with mixed parents}}{\#\ \text{of newborns}} .$$

We notice that they can be studied equivalently in $\Gamma$ by the
quadratic map of the interval $0 < \gamma < 1/2$ into $0 < \Gamma < 1/4$.
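
To make the definitions concrete, here is a minimal Python sketch of a quantifier as a ratio of counts, together with the quadratic map and its inverse on the interval where it is one-to-one. The function names and the example counts are hypothetical and serve only as an illustration.

```python
import math


def quantifier(cross_events: int, total_events: int) -> float:
    """Fraction of events involving immigrants, e.g. mixed marriages / marriages."""
    if total_events <= 0:
        raise ValueError("quantifier is undefined without events")
    return cross_events / total_events


def gamma_to_big_gamma(gamma: float) -> float:
    """Quadratic map Gamma = gamma * (1 - gamma)."""
    return gamma * (1.0 - gamma)


def big_gamma_to_gamma(big_gamma: float) -> float:
    """Inverse of the map on 0 <= gamma <= 1/2 (the branch below the maximum)."""
    # max() guards against tiny negative values caused by rounding near Gamma = 1/4.
    return (1.0 - math.sqrt(max(0.0, 1.0 - 4.0 * big_gamma))) / 2.0


# Hypothetical quarter in one municipality: 3 mixed marriages out of 12.
m_m = quantifier(3, 12)

gamma = 0.2
big_gamma = gamma_to_big_gamma(gamma)
gamma_back = big_gamma_to_gamma(big_gamma)
print(m_m, big_gamma, gamma_back)
```

The map is monotonic on this interval, which is why plotting a quantifier against $\gamma$ or against $\Gamma$ carries the same information.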



Fig. 3. Upper panel: time series representing the quantifiers and densities $\gamma$
versus quarters. Lower panel: quantifiers versus $\gamma$ obtained from time series.



To investigate the functional dependence of the quantifiers
on $\gamma$, we first tested the time series of the parameters
involved. Figure 3 shows how the average quantifiers and
density behave in the two databases in terms of the quarters.
The newborns with mixed parents display a very regular linear
increase over time. The mixed marriages behave similarly
but have an added seasonal periodicity. The two labor quantifiers
exhibit a more complex behavior over time. The two
lower panels show how the density of immigration increases
over time in the two databases. Using those functions $\gamma(t)$ and
inverting them into $t(\gamma)$, we can plot the quantifiers in terms of
$\gamma$, thereby obtaining the figures in the lower panel. As we can
see, apart from a vague functional dependence for the newborns,
all of the other quantifiers display erratic behavior and
escape a functional law. It is evident that the time fluctuations
through which these graphs are obtained contain spurious
external effects and, in addition, when those are absent
(the newborns case), the two processes of marginalization over
time and inversion yield a very poor output. The bottom line
is that the time series approach is not a suitable method for
obtaining the functional dependence we are looking for, since
it loses relevant information and propagates spurious external
effects.

To fully use the rich information of the two databases and extract from them the functional dependence of the quantifiers in terms of $\gamma$, we first proceed by identifying the empirical probability distribution ensemble for each dataset. We do so by merging into a unique catalogue the data entries in each database, regardless of their coordinates in space and time, and ordering them by increasing values of $\gamma$. The observed time windows cover a time scale much larger than the typical time scales involved in the dynamics of the jobs market or marriages/newborns that we focused on.

Analysis of their density versus $\gamma$ (Fig. 4) shows that, for both datasets, only about one percent of the data are found for $\gamma > 0.4$. To efficiently model the macroscopic behavior of the integration quantifiers with robust statistics, we limit our study below that threshold. We also notice that data density decreases for small $\gamma$. The reason for this is that our observation window started when the migration phenomenon was already running and the density of migrants in Spain was larger than zero.

The immigrant densities appear to be power law distributed. For the marriages and newborn dataset, as highlighted in the inset, we find that for $\gamma > 0.2$ the data density scales as $\gamma^{\delta}$ with $\delta = -3.241$.

Fig. 4. Density of the marriage and newborn dataset (circles) and of the job market dataset (crosses) as a function of $\gamma$. In the inset the marriage and newborn data density is fitted, for $\gamma > 0.2$, with a power-law behavior (in log-log scale) proportional to $\gamma^{\delta}$, $\delta = -3.241 \pm 0.024$.

Fig. 5 shows the raw data clouds for each quantifier. An apparent anomaly is the presence in the lower left panel of horizontal lines where the data agglomerate. A further analysis shows that their values are due to fractions with small denominators, that is, municipalities where the total number of marriages within the observed quarter does not exceed a few units. The explanation of this anomaly is found in the strong cyclical behavior due to seasonal preferences about the appropriate time for marriage in Spain (see also Fig. 3). People in Spain prefer to marry in the summer rather than in the winter, which is unsurprising. Aggregating the data from quarters to years would wash away the anomaly. Nevertheless, as we explain below, it turns out that coarsening the data in this way is not necessary, and it is possible to keep the dataset as is, and hence preserve its richness.

Fig. 5. Raw data versus $\gamma$. Green points represent the fraction of permanent job positions held by migrants in a municipality where a density $\gamma$ of migrants is present (apart from restrictions outlined in the introduction, the whole of Spain is sampled over the entire analyzed timeframe); similarly, orange points account for temporary jobs. Further, blue points represent the fraction of mixed marriages, while red ones mirror the newborns from mixed parents. One may note that data in the left panel seem to lie along horizontal lines displaced according to $1/n$, with $n \in \mathbb{N}$, due to seasonal preferences in weddings. See the discussion within the paper.
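
The power-law behavior of the data density reported for $\gamma > 0.2$ can be estimated with a straight-line fit in log-log scale. The sketch below is one simple way to do this, not necessarily the procedure used for the figure; the function name and the synthetic input are assumptions made for illustration.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(a) + delta * log(x); returns (a, delta)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    s_xx = sum((u - mean_x) ** 2 for u in log_x)
    s_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    delta = s_xy / s_xx
    a = math.exp(mean_y - delta * mean_x)
    return a, delta


# Synthetic density values following an exact power law with exponent -3,
# sampled in the tail region gamma > 0.2 (made-up numbers, not the paper's data).
gammas = [0.20, 0.25, 0.30, 0.35, 0.40]
densities = [2.0 * g ** -3.0 for g in gammas]

a, delta = fit_power_law(gammas, densities)
print(a, delta)
```

With real histogram counts, empty bins must be dropped before taking logarithms, and the fitted exponent depends on the chosen binning.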




Fig. 6. $J_p(\gamma)$: Data are represented as spots in each bin with error bars. The black line is the best fit of the free theory outcome $J_p(\gamma) = c_F \gamma(1 - \gamma)$ and yields the best value $c_F = 1.52 \pm 0.05$ with a coefficient of determination $R^2 = 0.985$.

Fig. 7. $J_t(\gamma)$: Data are represented as spots in each bin, while lines denote the error bars. The black line is the fit of the free theory outcome with $J_t(\gamma) = c_F \gamma(1 - \gamma)$ and yields the best value $c_F = 1.81 \pm 0.09$ with a coefficient of determination $R^2 = 0.963$.

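
The one-parameter fits quoted in the figure captions can be sketched as a least-squares amplitude fit against a fixed functional shape. The code below is an illustrative sketch under stated assumptions: the function names are hypothetical, the data are synthetic, the threshold `gamma_c` is treated as given rather than fitted, and the coefficient of determination is computed as one minus the ratio of residual to total sum of squares, which may differ from the convention used for the captions.

```python
import math


def free_basis(gamma: float) -> float:
    """Shape of the free (independent-choice) theory: gamma * (1 - gamma)."""
    return gamma * (1.0 - gamma)


def interacting_basis(gamma: float, gamma_c: float = 0.0) -> float:
    """Square-root shape with a step at the threshold gamma_c."""
    if gamma <= gamma_c:
        return 0.0
    return math.sqrt(gamma * (1.0 - gamma))


def fit_amplitude(gammas, values, basis):
    """Return (c, r_squared) for the model value = c * basis(gamma)."""
    fs = [basis(g) for g in gammas]
    c = sum(v * f for v, f in zip(values, fs)) / sum(f * f for f in fs)
    mean = sum(values) / len(values)
    ss_res = sum((v - c * f) ** 2 for v, f in zip(values, fs))
    ss_tot = sum((v - mean) ** 2 for v in values)
    return c, 1.0 - ss_res / ss_tot


# Synthetic bin averages generated from the square-root law with amplitude 0.5.
gammas = [0.05 * k for k in range(1, 9)]  # 0.05, 0.10, ..., 0.40
values = [0.5 * interacting_basis(g) for g in gammas]

c_free, r2_free = fit_amplitude(gammas, values, free_basis)
c_int, r2_int = fit_amplitude(gammas, values, interacting_basis)
print(c_free, r2_free, c_int, r2_int)
```

On data that follow the square-root shape, the linear shape still returns an amplitude but with a visibly lower coefficient of determination, which is the kind of contrast used to separate the two regimes.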

Since we are interested in the quantifiers' averages as functions
of γ, and since all quantifiers are ratios, there are two
possible ways of computing the averages. For a given bin of
γ one can compute 1) the statistical average of the ratios, or
2) the ratio between the statistical average of the numerators and
the statistical average of the denominators. The first is the
usual mean of the ratios and the second is their global mediant.
As will be explained later, the difference in the results
obtained from the two distinct procedures is in the range of
0.1 to 0.2 percent (see Fig. 10), and they can consequently be
considered effectively equivalent.
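
The two averaging procedures described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the toy numbers are hypothetical, and each record is assumed to carry one numerator (for example, mixed marriages) and one denominator (total marriages).

```python
def mean_of_ratios(numerators, denominators):
    """Procedure 1: statistical average of the per-record ratios."""
    ratios = [n / d for n, d in zip(numerators, denominators)]
    return sum(ratios) / len(ratios)


def mediant(numerators, denominators):
    """Procedure 2: average numerator divided by average denominator.

    The two 1/len factors cancel, so this equals sum(num) / sum(den).
    """
    return sum(numerators) / sum(denominators)


# Hypothetical records falling in the same gamma bin.
mixed = [1, 3, 10]
total = [10, 20, 100]
print(mean_of_ratios(mixed, total))  # average of 0.10, 0.15, 0.10
print(mediant(mixed, total))         # 14 / 130
```

The mediant weights each record by the size of its denominator, whereas the mean of ratios weights all records equally, so the two can only differ when denominators vary within a bin.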

We then proceed by grouping the data into bins over γ
in which the averages can be evaluated. We found that 15
to 20 bins optimize the job market dataset, while 35 to 40







Fig. 8. M_m(γ): Data are represented as spots in each bin with error bars. The
black line is the best fit of the free theory outcome M_m(γ) = c_F γ(1 − γ)
and yields the best value c_F = 1.18 ± 0.07 with a coefficient of determination
R² = 0.855. The blue line is the fit of the interacting theory outcome
with M_m(γ) = c_I √(γ(1 − γ)) θ(γ − γ_c) with c_I = 0.53 ± 0.02,
γ_c = 0.0047 ∈ (0.0034, 0.0064) and with a coefficient of determination
R² = 0.992.

Fig. 9. B_m(γ): Data are represented as spots in each bin, while lines denote
the error bars. The black line is the fit of the free theory outcome with
B_m(γ) = c_F γ(1 − γ) and yields the best value c_F = 0.64 ± 0.05 with a coefficient
of determination R² = 0.789. The red line is the fit of the interacting theory
outcome with B_m(γ) = c_I √(γ(1 − γ)) θ(γ − γ_c) with c_I = 0.28 ± 0.01,
γ_c = 0.0036 ∈ (0.0022, 0.0057) and with a coefficient of determination
R² = 0.984.



optimize the dataset on marriages and newborns. For the
binning criteria, we tested the method of constant information
and that of constant bin width. The advantage of the
first criterion is a constant robustness quality across all bins.
Clearly, with this approach, the width of the bins will vary over
γ (their width increases at high values of γ due to the data
density decrease reported in the corresponding figure). This can be avoided if
we instead use constant bin width. However, with this approach
the tradeoff is that as γ increases, the amount of data
inside each bin may diminish (in particular for γ > 0.4). As
we confined our analysis to γ_max ≈ 0.4 for robustness requirements,
the two criteria produce essentially the same results, as
one can see in Fig. 10.
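
The two binning criteria can be illustrated as follows. This sketch reads "constant information" as "the same number of records in every bin", which is an interpretation suggested by the fixed number of data per bin quoted in the text rather than a definition given there; function names and sample values are hypothetical.

```python
def constant_width_bins(gammas, n_bins, gamma_max=0.4):
    """Split [0, gamma_max] into n_bins equal-width bins."""
    width = gamma_max / n_bins
    bins = [[] for _ in range(n_bins)]
    for g in gammas:
        if 0.0 <= g <= gamma_max:
            # Clamp so that g == gamma_max falls in the last bin.
            k = min(int(g / width), n_bins - 1)
            bins[k].append(g)
    return bins


def constant_information_bins(gammas, per_bin, gamma_max=0.4):
    """Sort the data and cut it into consecutive groups of per_bin records.

    Every bin holds the same number of records (the last one may hold
    fewer), so bin widths adapt to the local data density.
    """
    kept = sorted(g for g in gammas if 0.0 <= g <= gamma_max)
    return [kept[k:k + per_bin] for k in range(0, len(kept), per_bin)]


# Hypothetical densities; the value 0.5 lies above the 0.4 cutoff.
sample = [0.01, 0.02, 0.03, 0.05, 0.12, 0.25, 0.39, 0.5]
print([len(b) for b in constant_width_bins(sample, 4)])
print([len(b) for b in constant_information_bins(sample, 3)])
```

With constant width the bin populations follow the data density, while with constant information the populations are fixed and the widths vary instead, which is the tradeoff described above.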

Figures 6 and 7 show the outcome of the averaging criteria
and coarse-graining procedure for J_p and J_t. The dot plots
are made of 17 bins. In each bin the dot represents the average
value of 200 data points, and the vertical bar their standard
deviation. In each figure the black curve is the free fit, that
is, the curve of type c_F γ(1 − γ) best fitting the experimental
points. Their goodness of fit, reported in the respective captions,
is estimated as R²_{J_p(γ)} ≈ 0.985 and R²_{J_t(γ)} ≈ 0.963.
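
A one-parameter fit of the free curve and its coefficient of determination can be sketched as below. The text does not state which fitting procedure was used, so ordinary unweighted least squares is an assumption here, as are the function names and the synthetic data.

```python
def fit_free_theory(gammas, values):
    """Least-squares estimate of c_F in values ~ c_F * gamma * (1 - gamma).

    The model is linear in c_F, so the minimizer has the closed form
    sum(x * y) / sum(x * x) with x = gamma * (1 - gamma).
    """
    x = [g * (1.0 - g) for g in gammas]
    return sum(xi * yi for xi, yi in zip(x, values)) / sum(xi * xi for xi in x)


def r_squared(values, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(values) / len(values)
    ss_res = sum((y - p) ** 2 for y, p in zip(values, predictions))
    ss_tot = sum((y - mean) ** 2 for y in values)
    return 1.0 - ss_res / ss_tot


# Synthetic bin averages generated from the free law with c_F = 1.5.
gammas = [0.05, 0.1, 0.2, 0.3, 0.4]
values = [1.5 * g * (1.0 - g) for g in gammas]
c_f = fit_free_theory(gammas, values)
fitted = [c_f * g * (1.0 - g) for g in gammas]
print(c_f, r_squared(values, fitted))
```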

Fig. 10. Upper panel: Relative errors (blue lines) and absolute errors (green
lines) as a function of γ for M_m(γ). Lower panel: Relative errors (red lines) and
absolute errors (green lines) of B_m(γ). For both panels the continuous lines and the
dash-dotted lines represent the errors made using the mean approach with respect to
the mediant one with constant step binning and with constant information binning,
respectively. The dotted lines and the dashed lines represent the errors made using
the constant step binning with respect to the constant information binning, with the mean
approach and with the mediant approach, respectively.

Figures 8 and 9 show the results for M_m and B_m. In
this case, the dot plots are made of 38 bins, and
each dot comes from 700 points. Here the free
fit has a much lower goodness of fit: R²_{M_m(γ)} ≈ 0.855 and
R²_{B_m(γ)} ≈ 0.789. In particular, the data show an anomalously
high growth rate for small γ and a low one for large
γ. For this reason we tested another family of curves, whose
genesis we aim to explain through statistical mechanics in
the next sections, and which ultimately accounts for interactions
among persons. Remarkably, all of these curves scale as
c_I √(γ(1 − γ)) θ(γ − γ_c), where θ is the step function. The agreement
of the fit with the experimental data is clearly shown by
the values R²_{M_m(γ)} ≈ 0.992 and R²_{B_m(γ)} ≈ 0.984. In the above
formula, one may note the classical exponent one half typical
of theories that account for imitative interaction. The
presence (or lack) of a critical value γ_c is determined by the
underlying social network. It is known that in ferromagnetic
theories network dilution eventually decreases γ_c to zero, without
affecting the critical exponent for a large class of social
topologies [15][33][34][26][27][28]p5][29][30]. Accordingly, we
found empirical values γ_c ∼ 10⁻³, as reported in the captions
of Figures 8 and 9.
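
A two-parameter fit of the interacting curve can be sketched as follows. The fitting strategy (a grid search over candidate thresholds combined with a closed-form least-squares amplitude) is an assumption chosen for simplicity, not the procedure used for the reported values; function names, the convention θ(0) = 1, and the synthetic data are hypothetical.

```python
import math


def interacting_curve(gamma, c_i, gamma_c):
    """c_I * sqrt(gamma * (1 - gamma)) * theta(gamma - gamma_c), theta(0) = 1."""
    if gamma < gamma_c:
        return 0.0
    return c_i * math.sqrt(gamma * (1.0 - gamma))


def fit_interacting_theory(gammas, values, gamma_c_grid):
    """Return (c_I, gamma_c) minimizing the sum of squared residuals.

    For each candidate gamma_c the model is linear in c_I, so the best
    amplitude is sum(x * y) / sum(x * x); the threshold is chosen by
    comparing the resulting residuals over the grid.
    """
    best = None
    for gamma_c in gamma_c_grid:
        x = [interacting_curve(g, 1.0, gamma_c) for g in gammas]
        sxx = sum(xi * xi for xi in x)
        if sxx == 0.0:
            continue  # threshold above all data: nothing to fit
        c_i = sum(xi * yi for xi, yi in zip(x, values)) / sxx
        sse = sum((yi - c_i * xi) ** 2 for xi, yi in zip(x, values))
        if best is None or sse < best[0]:
            best = (sse, c_i, gamma_c)
    return best[1], best[2]


# Synthetic bin averages generated with c_I = 0.5 and gamma_c = 0.005.
gammas = [0.002, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4]
values = [interacting_curve(g, 0.5, 0.005) for g in gammas]
print(fit_interacting_theory(gammas, values, [0.0, 0.005, 0.02]))
```

Because γ_c is of order 10⁻³ in the fits quoted in the captions, only the first few bins are sensitive to it; the amplitude c_I is determined by the rest of the range.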

The robustness of our findings is tested against averages
and binning choices in Fig. 10 and discussed in the corresponding
caption. Note that the relative errors are expected to increase
at high values of γ, while in that region they are reduced with
respect to those for smaller γ by at least one order of magnitude
in all observables. The apparent increase of the relative
error sizes at small γ is due instead to a ratio between small
numbers and is a simple and harmless consequence of numerical
noise, as confirmed by the behavior of their absolute
values.

For completeness we then analyze the quantifiers' fluctuations
from the averages. For each bin we fix the center of the
distribution at the average dot and check the relative deviation
of each sample datum from this reference point. Results
are reported in Fig. 11, where behaviors are also fitted with
standard distribution laws such as Gaussian, Logistic, Gumbel, and
Cauchy (the goodness of fit is reported in the caption). The
rescaled distribution for M_m(γ) is slightly asymmetric to the
right: the Gumbel distribution (compared with Logistic and

Fig. 11. Left upper panel: rescaled distribution of J_p(γ). Fits with standard
distribution laws are reported: Cauchy distribution R² = 0.951, Logistic distribution
R² = 0.994 and Gaussian distribution R² = 0.986. Right upper panel: rescaled
distribution of J_t(γ). Fits with standard distribution laws are reported: Cauchy
distribution R² = 0.956, Logistic distribution R² = 0.996 and Gaussian distribution
R² = 0.984. Left lower panel: rescaled distribution of M_m(γ). Fits with standard
distribution laws are reported: Gumbel distribution R² = 0.979, Logistic
distribution R² = 0.956 and Gaussian distribution R² = 0.941. Right lower panel:
rescaled distribution of B_m(γ). Fits with standard distribution laws are reported:
Cauchy distribution R² = 0.958, Logistic distribution R² = 0.994 and Gaussian
distribution R² = 0.988. The inset represents the normal probability plot.



Gauss distribution) gives the best fit. The rescaled distributions
for J_p(γ), J_t(γ) and B_m(γ) are more symmetric: in these
cases the Logistic distribution (compared with Cauchy and
Gauss distribution) yields the best fit. In the insets the normal
probability plots for the rescaled quantifiers show that
the tails of the sample distributions are non-Gaussian.



The Statistical Mechanics Perspective 

In the previous section, we analyzed four classical integration 
quantifiers that describe some type of social coupling like the 
one between the employer and the employee in the job mar- 
ket, or between individuals in a marriage or parenthood. In 
this and the next sections, we provide some theoretical bases 
for these observed phenomena with the help of probabilistic 
models, starting from simple combinatorial methods up to 
techniques that use the modern theory of disordered statisti- 
cal mechanics [5]. 

Let us first analyze the job market. The number of employers
in each municipality, be they physical persons, associations,
or institutions, is proportional to the number of
natives N_nat, that is, proportional to 1 − γ. This is due to
the fact that the fraction of immigrant employers is negligible
with respect to the residents. On the other hand, the number
of immigrant employees is proportional to γ. An elementary
combinatorial computation predicts a frequency of jobs given
to immigrants of the following type



$$ P = c_F \, \gamma (1 - \gamma) , $$   [8]



where c_F is some proportionality constant. This formula provides
a good fit to the employment data for both permanent
and temporary jobs, as one can see from Figures 6 and 7, and
can also be obtained by a probabilistic model that reveals its
general underlying assumptions. By assigning each job position
a two-valued random variable, since the job is given either to
an immigrant or to a resident, one sees that the main feature
of models predicting a similar behavior in γ is the assumption
of the mutual independence of those random variables.
In other words, the likelihood of giving a job to an immigrant
is unaffected by whether another job has been given
to an immigrant or not. Such models, within socioeconomic
sciences, are all versions of the discrete choice theory by D.
McFadden [20], originally devised to predict the use of public
transportation but nowadays also used in other contexts such
as occupations, residency locations, etcetera. That method,
by suitably measuring the parameters of the theory (by polls
or historical series), can yield quite accurate predictions, as it
did in the Bay Area Rapid Transit problem. Such models are usually
parameterized according to the logit (or multi-logit) probability
distribution. Their predictive success is based on the
fulfillment of the independence hypothesis. Potential threats
to its validity are then peer-to-peer effects, belief propagation
effects, and in general all the situations where individual
rational choice is perceived as difficult and people instead resort
to imitation or anti-imitation of others. Within statistical
mechanics, a theory of independent random variables is also
called a free theory, as opposed to an interacting one. The
discrete choice theory is quite well suited for policy-making
for several reasons. First, it is based on empirical data and,
second, the utility function it is built on allows researchers to
test concrete policy scenarios by varying the free parameters.
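
The elementary combinatorial count behind Eq. 8 can be written out explicitly. The block below is a sketch under the assumptions stated in the text (employers proportional to natives, immigrant employees proportional to immigrants, independent assignments); the symbols N, N_imm and N_emp are introduced here for illustration, and the overall normalization is absorbed into c_F.

```latex
\begin{align*}
N_{\mathrm{nat}} &= (1-\gamma)\,N, \qquad
N_{\mathrm{imm}} = \gamma\,N, \qquad
N_{\mathrm{emp}} \propto N_{\mathrm{nat}},\\
\#\{\text{employer--immigrant pairs}\} &\propto N_{\mathrm{emp}}\,N_{\mathrm{imm}}
  \propto \gamma(1-\gamma)\,N^{2},\\
P(\gamma) &= c_F\,\gamma(1-\gamma), \qquad
\max_{\gamma} P = P(1/2) = \frac{c_F}{4}.
\end{align*}
```

The form is symmetric under γ → 1 − γ and grows linearly, with slope c_F, for small γ.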

Coming to the other two quantifiers, the mixed marriages
and newborns, one can see that the same type of free theory
that provided a good fit for the data in the job market case is
much less appropriate here. The data show an anomalously
high growth at small values of γ, largely underestimated by
the free theory, followed by a crossover, after which the free
theory overestimates the quantifier. The error of the free theory
reaches up to 30% of the quantifier's value.

The first thing to check is whether the different nature of
the random variables can be responsible for the difference in
social behavior. Social couplings, for instance in marriage, do
fulfill the interaction rule of monogamy. That is, each individual
is married to only one other individual, unlike employers,
who may have many employees, or an employee, who may hold
more than one job. Similarly, the number of children of a mixed
couple does not on average exceed a few units. All these are indeed forms
of interaction, but a straightforward computation shows that
if no other interactions are present the same type of behavior
as in equation (8) emerges for the probability of mixed marriages
and newborns, as discussed in the last section.

Figures 8 and 9 show that a different type of curve provides
an impressively good fit, namely the square root of the main
quantity γ(1 − γ), i.e.

$$ P = c_I \sqrt{\gamma (1 - \gamma)} $$   [9]

for a suitable proportionality constant c_I. That curve carries
the fingerprint of the mean-field imitation theory of statistical
mechanics [31] [32], and its match with the phenomenological
data, besides being visually manifest, is well shown by the
coefficients of determination reported in the captions.

The next section is devoted to filling the gap between the 
empirical laws and the theoretical apparatus of statistical me- 
chanics. We introduce a mathematical model based on indi- 
vidual choices, with an internal structure (microscopic theory) 
whose emergent social behavior (macroscopic theory) repro- 
duces the observed quantifiers. The model reduces to a free 
McFadden theory when peer-to-peer interaction is negligible 




Fig. 12. Left: representation of a possible marriage configuration. Black spots
represent natives, while white ones represent migrants. Marriages are represented by colored
lines connecting two points according to whether a particular marriage is among the
same colors (black line) or among different ones (blue line). The figure represents
a population of 50 people with 10 immigrants, where 4 couples out of 12 are of
mixed type. Right: representation of a possible newborn configuration representing
8 children from mixed couples (red lines) out of a total of 27. Isolated dots are,
respectively in the left and right figures, single people or people without children. Arrows
represent dichotomic labeling of individuals such that, for the marriages, up-arrows
stand for people belonging to a mixed marriage (and vice versa for down-arrows),
while for newborns, up-arrows represent people with children by a mixed couple (and
vice versa for down-arrows). See Eqs. 15 and 17, which specify the variables appearing
in the Hamiltonians 14 and 16.



and to the Brock-Durlauf theory when imitation is instead the 
dominating factor. 



Monomer-Dimers on Interacting Social Networks 

In this section we want to provide some mathematical details 
for the readers who are closer to the hard sciences. We start 
by characterizing pictorially the possible configurations of the 
two phenomena of mixed marriages and newborns in Fig. 12,
then we focus on the details of the statistical mechanics for- 
malism. 

To introduce the statistical mechanics model, we assign 
to each person their own tendency to marry versus remaining 
single. In addition, each couple (i, j) has their own likelihood
to marry or not. Similarly, each person has an individual 
tendency to have children, and each couple too. All of these 
phenomena are then described by individual random variables 
and couple random variables. 

The two observables about marriages and children are of 
course different: for marriages, the monogamy law only allows 
each individual to belong to a single couple. Newborn couplings
instead may not only have multiplicity, but individuals
may have children with different partners regardless of being
married or not.

All of these rules, from the mathematical point of view,
turn into topological constraints on the configurational graph,
like the hard-core interaction of monogamy, or probabilistic
constraints such as the concentration of children per couple
around small integers. The rule structure can be described
as follows. Given a set of points 1, …, N, a configuration of
marriages M is a set of links among the N points with the
property that no point belongs to more than one link (see left
panel in Fig. 12). We indicate the unpaired individuals (singles)
by S_M and the paired ones (married couples) by C_M.
We call the set of marriage configurations ℳ. We want to
describe a system in which we assign to each configuration
a statistical weight and a partition function built on individual
random variables as well as on couple random variables.
Calling s_i the weight of the person i in the single state and
c_{i,j} that of the couple (i, j) in the married state (both the c's
and the s's are positive real numbers), the partition function
(grand canonical) of the system is given by

$$ Z^{(M)} = \sum_{M \in \mathcal{M}} \; \prod_{(i,j) \in C_M} e_{i,j} \, c_{i,j} \prod_{i \in S_M} s_i , $$   [10]



where the numbers e_{i,j} ∈ {0, 1} are the acquaintance matrix
elements of the population. Similarly, a configuration of filiations
F is a set of links among the N points with the property
that for a given couple (i, j) (not necessarily married) the
number of children (links) is distributed according to (say)
a Poisson distribution p of given average λ. The choice of
the Poisson distribution is the most reasonable, but our conclusions
do not depend on this particular choice. We indicate
individuals without children by U_F (undescendent) and the
couples with children by P_F (parents). We call the set of
filiations ℱ. Calling u_i the weight of the person i in the undescendent
state and p_{i,j} that of the child (i, j) in the parental
state (both the u's and the p's are positive real numbers), the
partition function of the system is then given by

$$ Z^{(F)} = \sum_{F \in \mathcal{F}} p(F) \prod_{(i,j) \in P_F} e_{i,j} \, p_{i,j} \prod_{i \in U_F} u_i . $$   [11]



The acquaintance matrix encodes the topology of the system, like d-dimensional lattices, or the
complete graph of N points, or more refined structures like
Erdős-Rényi and small-world graphs. The random variables
(c, s, p, u) are taken to be constant in mean-field models like
the one we treat explicitly in this work. Calling K_M the total
number of links in the configuration M and defining the
frequency as ν_M = K_M/(N/2), the expected value of the marriage
frequency can be computed as

$$ P_M = \mathrm{Av} \left[ \frac{\sum_{M \in \mathcal{M}} \nu_M \prod_{(i,j) \in C_M} c_{i,j} \, e_{i,j} \prod_{i \in S_M} s_i}{\sum_{M \in \mathcal{M}} \prod_{(i,j) \in C_M} c_{i,j} \, e_{i,j} \prod_{i \in S_M} s_i} \right] , $$   [12]

where the average operation Av is computed on the acquaintance
matrix ensemble. Similarly for the newborn problem,
calling K_F the total number of links in the configuration F
and defining the fraction ν_F = K_F/(Nλ/2), its expected value is

$$ P_F = \mathrm{Av} \left[ \frac{\sum_{F \in \mathcal{F}} \nu_F \, p(F) \prod_{(i,j) \in P_F} p_{i,j} \, e_{i,j} \prod_{i \in U_F} u_i}{\sum_{F \in \mathcal{F}} p(F) \prod_{(i,j) \in P_F} p_{i,j} \, e_{i,j} \prod_{i \in U_F} u_i} \right] . $$   [13]



For each population, the previous probability measure pro- 
vides an average value of the two observables. 
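
To make the partition function of Eq. 10 and the expected marriage frequency of Eq. 12 concrete, the following sketch enumerates all monomer-dimer configurations of a small acquaintance graph by brute force. It assumes constant weights c and s (the mean-field choice mentioned in the text) and a single fixed graph, so the ensemble average Av is not performed; the function names and the toy graph are hypothetical, and the enumeration is only feasible for a handful of nodes.

```python
from itertools import combinations


def matchings(edges):
    """Yield every set of acquaintance links in which no person appears twice."""
    edges = list(edges)

    def extend(start, used, chosen):
        yield tuple(chosen)
        # Only consider later edges so each matching is produced once.
        for k in range(start, len(edges)):
            i, j = edges[k]
            if i not in used and j not in used:
                yield from extend(k + 1, used | {i, j}, chosen + [(i, j)])

    yield from extend(0, frozenset(), [])


def weight(n_people, matching, c=1.0, s=1.0):
    """Weight c per couple times weight s per single person."""
    return c ** len(matching) * s ** (n_people - 2 * len(matching))


def partition_function(n_people, edges, c=1.0, s=1.0):
    return sum(weight(n_people, m, c, s) for m in matchings(edges))


def expected_marriage_frequency(n_people, edges, c=1.0, s=1.0):
    """Weighted average of nu_M = K_M / (N / 2) over all configurations."""
    z = partition_function(n_people, edges, c, s)
    total = sum(weight(n_people, m, c, s) * len(m) / (n_people / 2)
                for m in matchings(edges))
    return total / z


# Toy example: four people who all know each other (complete graph).
toy_edges = list(combinations(range(4), 2))
print(partition_function(4, toy_edges))           # 1 empty + 6 single + 3 perfect
print(expected_marriage_frequency(4, toy_edges))
```

Only pairs listed in `edges` (those with e_{i,j} = 1) can form a couple, which is how the acquaintance matrix enters the sum.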
We now turn to the theory of bi-populated systems where
s_i and u_i take two values each, depending only on the individual
being in Imm or Nat, and the couple variables
(c_{i,j} and p_{i,j}) take only three values for the three cases
(Imm, Imm), (Nat, Imm) and (Nat, Nat). We may include
an imitative (J > 0) interaction between the two populations
with the introduction of a suitable mean-field Hamiltonian [33]

$$ H(M) = -J_M \sum_{i \in \mathrm{Nat},\, j \in \mathrm{Imm}} e_{i,j} \, \sigma_i \sigma_j , $$   [14]

where

$$ \sigma_i = \begin{cases} +1 & \text{if } i \text{ belongs to a mixed marriage} \\ -1 & \text{otherwise} , \end{cases} $$   [15]

and a similar definition for H(F):

$$ H(F) = -J_F \sum_{i \in \mathrm{Nat},\, j \in \mathrm{Imm}} e_{i,j} \, \tau_i \tau_j , $$   [16]

where

$$ \tau_i = \begin{cases} +1 & \text{if } i \text{ has a child within a mixed couple} \\ -1 & \text{otherwise} . \end{cases} $$   [17]
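
The way the labels and the Hamiltonian follow from a marriage configuration can be sketched as below. The code reads Eq. 14 as H(M) = −J_M Σ e_{i,j} σ_i σ_j, with the sum over native-immigrant pairs, and Eq. 15 as σ_i = +1 for members of a mixed marriage and −1 otherwise; the function names, the data layout and the toy population are hypothetical.

```python
def sigma_from_matching(people, immigrants, matching):
    """sigma_i = +1 if person i is in a native-immigrant couple, else -1."""
    sigma = {i: -1 for i in people}
    for i, j in matching:
        # A couple is mixed when exactly one partner is an immigrant.
        if (i in immigrants) != (j in immigrants):
            sigma[i] = sigma[j] = 1
    return sigma


def hamiltonian(natives, immigrants, acquainted, sigma, coupling):
    """-J times the sum of sigma_i * sigma_j over acquainted native-immigrant pairs."""
    total = 0
    for i in natives:
        for j in immigrants:
            if (i, j) in acquainted or (j, i) in acquainted:
                total += sigma[i] * sigma[j]
    return -coupling * total


# Toy population: natives 0, 1, 2 and immigrants 3, 4; every native
# knows every immigrant, and natives 1 and 2 also know each other.
natives, immigrants = {0, 1, 2}, {3, 4}
acquainted = {(i, j) for i in natives for j in immigrants} | {(1, 2)}
matching = [(0, 3), (1, 4)]  # two mixed couples, person 2 stays single
sigma = sigma_from_matching(natives | immigrants, immigrants, matching)
print(sigma)
print(hamiltonian(natives, immigrants, acquainted, sigma, coupling=1.0))
```

With J > 0, configurations in which acquainted natives and immigrants carry the same label lower the energy, which is the imitative effect the Hamiltonian is meant to encode.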



Note that the configurations of the σ's and τ's are uniquely determined
by the monomer-dimer configurations. We point out that the case
where imitation and anti-imitation coexist leads to a different
scenario (see [18] for a case study). The two complete
partition functions are:
[18]



[19] 



We point out that the introduction of an exponential Hamiltonian
deformation of the monomer-dimer model is a working
hypothesis to be tested against experimental data, and it has
the same significance as the logit distribution assumption in
the original McFadden discrete choice theory. For a discussion
of its justification in terms of entropy
variational principles see [10] and references therein.



Calling M_M the number of mixed marriages in the configuration
M and defining the frequency of mixed marriages
f_M = M_M/K_M, we have that its expected value, that is, the
probability of mixed marriages, is

and analogously the probability of mixed children

[21]

is given in terms of the frequency of children from mixed couples
f_F = M_F/K_F, where M_F is the number of children from
mixed couples in the configuration F. Although an exact solution
for the general model introduced in this section is not
yet available, one can still obtain results for a wide variety
of cases that include the mono-populated and bi-populated
mean-field limits [18] [36] [37]. The latter shows two regimes
according to the ratio of J (the couplings J_M and J_F tuning
the strength of the imitative behavior encoded in the Hamiltonians
14 and 16, hereafter called J for simplicity) and the
monomer-dimer pressure p = ln Z/N. Given the lack of phase
transitions in the Monomer-Dimer model (see [38], [39] for a rigorous
proof of the non-random mean-field case), we can focus
on the extreme regimes: the imitative regime J ≫ p, in which
the interaction J dominates over the Monomer-Dimer interaction
(hard core or Poisson), and the free regime J ≪ p, where
the Monomer-Dimer interaction dominates over the imitative
one. The structural difference between them is the presence
of some divergence of the derivative of the P's in formulas 20
and 21,

$$ \partial_\gamma P(\gamma) . $$   [22]

In the free regime one finds for the P's a γ dependence of the
type

$$ P(\gamma) = c_F \, \gamma (1 - \gamma) , $$   [23]

where the constant c_F depends only on the a priori probabilities
of the Monomer-Dimer interaction. The expression can
also be obtained from purely probabilistic (or combinatorial)
reasoning and always displays a finite derivative at the origin.
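
The contrast between the two regimes at the origin can be checked by differentiating the two candidate laws directly. This is a short worked computation using the free form of Eq. 23 and the square-root form of Eq. 9; it is a consistency check rather than part of the original derivation.

```latex
\begin{align*}
\frac{d}{d\gamma}\Big[c_F\,\gamma(1-\gamma)\Big]
  &= c_F\,(1-2\gamma) \;\longrightarrow\; c_F
  \quad (\gamma \to 0^{+}),\\
\frac{d}{d\gamma}\Big[c_I\,\sqrt{\gamma(1-\gamma)}\Big]
  &= \frac{c_I\,(1-2\gamma)}{2\sqrt{\gamma(1-\gamma)}} \;\longrightarrow\; +\infty
  \quad (\gamma \to 0^{+}).
\end{align*}
```

The finite slope c_F at the origin is consistent with the free fit underestimating the steep initial growth observed for mixed marriages and newborns.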
In the imitation regime, on the other hand, the interactions
among agents encoded in the Hamiltonians favor the imitative
behavior, while the Monomer-Dimer interaction term,
accounted for by the adjacency matrix of the reciprocal relation,
plays the role of a phase-selecting perturbation (the +
phase of the Hamiltonians) similar to a small magnetic field in
spin models. The mean-field Hamiltonian we introduced has a
size proportional to γ(1 − γ), and for various adjacency matrices
e_{i,j} defining diluted topologies (i.e., random graphs, small
worlds, etc. [27], [33], [35], [40]), the model predicts a behavior of
the type:

$$ P = c_I \, [\gamma (1 - \gamma)]^{1/2} , $$   [24]



where the constant c_I, as much as c_F in the free case, depends
only on the a priori probabilities of the Monomer-Dimer
interaction. The mechanism underlying such behavior is, as
long as the (social) network is over-percolated [41] and the
interactions among agents are only imitative, the mean-field
ferromagnet with the critical exponent one half. Its relevance
in social sciences has been clearly advocated by Durlauf [42].
The critical value of γ turns out to be very close to zero as
a consequence of dilution [33], [35], [40] and [13]. Due to the
large ensemble of data we analyzed, we can invoke the Law
of Large Numbers, which allows us to compare experimental
frequencies reported in the first part of the paper with probabilities
obtained through the statistical mechanics method.
The theoretical models illustrated reach the precise functional
behavior of equation 24 in the limit of infinitely many particles.
Finite-size systems display a round-off effect similar to
the one we observe here (see Figures 8 and 9).